Unbiased Concurrent Evaluation on a Budget
Not recorded
Abstract
Eliciting expert judgments for evaluating the performance of structured prediction systems (e.g., search engines, recommender systems) is labor-intensive and costly. This raises the question of how to get high-quality performance estimates given a relatively small budget of judgments. In this paper, we provide theoretically justified – yet highly practical and efficient – methods for selecting a subset of items to judge, addressing four commonly encountered evaluation scenarios: estimating the performance of a single system, comparing a system to a benchmark system, and estimating the relative and absolute performances of k systems. Our approach is based on designing effective importance sampling estimators that give provably unbiased performance estimates and that can re-use previously collected judgments. To this effect, we exploit the fact that many performance metrics of interest can be expressed as an expectation and derive theoretically optimal importance sampling distributions for estimating this expected value. We empirically demonstrate the effectiveness of our framework, showing that it eliminates the bias inherent in deterministic selection strategies (e.g. the pooling method) and that it can drastically reduce the number of judgments necessary compared to naive sampling approaches.
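The idea of writing a metric as an expectation and estimating it by importance sampling can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy relevance labels, the DCG-style position weights `p`, and the sampling distribution `q` (an even mix of `p` and uniform), which is not the optimal distribution derived in the paper. The metric is U = sum_i p_i * r_i, and each sampled judgment is reweighted by p_i / q_i so that the estimate stays unbiased.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def exact_metric(p, relevance):
    """Metric written as an expectation: U = sum_i p_i * r_i."""
    return sum(p_i * r_i for p_i, r_i in zip(p, relevance))


def is_estimate(p, q, judge, budget, rng):
    """Importance sampling estimate of U from `budget` sampled judgments.

    Items are drawn from q (with replacement); each judgment is
    reweighted by p_i / q_i, which makes the estimate unbiased as long
    as q_i > 0 wherever p_i > 0.
    """
    sampled = rng.choices(range(len(p)), weights=q, k=budget)
    return sum(p[i] / q[i] * judge(i) for i in sampled) / budget


# Toy ranking of 10 items with hidden binary relevance (hypothetical data).
relevance = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

# Hypothetical DCG-style position weights, normalized into a distribution.
p = normalize([1 / math.log2(rank + 1) for rank in range(1, len(relevance) + 1)])

# Illustrative sampling distribution: half p, half uniform (an assumption).
uniform = 1 / len(p)
q = [0.5 * p_i + 0.5 * uniform for p_i in p]

rng = random.Random(0)
truth = exact_metric(p, relevance)

# Average many small-budget estimates; the mean should approach `truth`.
estimates = [is_estimate(p, q, relevance.__getitem__, 5, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
print(f"exact metric: {truth:.4f}, mean of estimates: {mean_estimate:.4f}")
```

Each individual estimate uses only five judgments and is noisy, but its expected value equals the exact metric. A deterministic selection rule such as judging only the top-ranked items carries no such guarantee.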
Similar resources
Unbiased Ranking Evaluation on a Budget
We address the problem of assessing the quality of a ranking system (e.g., search engine, recommender system, review ranker) given a fixed budget for collecting expert judgments. In particular, we propose a method that selects which items to judge in order to optimize the accuracy of the quality estimate. Our method is not only efficient, but also provides estimates that are unbiased — unlike c...
Evaluation of Eight Evaporation Estimation Methods in a Semi-arid Region (Dez reservoir, Iran)
Establishing satisfactory methods for estimating lake evaporation is vital for research and management of water resources and ecosystems. Determining an accurate method to estimate evaporation from reservoirs is very important in the investigation and management of water resources. Hence, in this study eight empirical methods, such as Makkink, DeBruin-Keijman, Penman, Priestley-Ta...
The effect of the budget slack creation and budget internal control by managers on maximization of utility function in budgetary participation
When the evaluation of senior managers' performance is based on achieving the budget, and they are also responsible for reporting the capacity of the resources under their control, they can create budget slack and affect their performance evaluation by providing pessimistic and conservative estimates or manipulated information on income and expenses. So, Senior Manager and Budget...
An evaluation of the systematic relation between Energy, Economy, Environment (E3); a case study of the MENA countries
The way the Energy, Economy and Environment systems correlate, together with appropriate policy, brings about a country's sustainable growth and economic development (E3 models). The models that deal with the relation between these aspects are generalized versions of economic growth models which take into account the energy and environmental factors. This research analyses the concurrent effect of ene...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015